feat: clean the config and fix the manifests generation #1953

Open
cw-Guo wants to merge 28 commits into fluent:master from cw-Guo:feat/clean-config

cw-Guo (Collaborator) commented May 16, 2026

What this PR does / why we need it:

Cleans up config/ and consolidates manifests/setup/setup.yaml. config/default/ is now the single canonical kustomize entry point for the production install bundle. Also fixes the kubebuilder markers so that the expected roles and related resources are generated.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?


Additional documentation, usage docs, etc.:


Copilot AI review requested due to automatic review settings May 16, 2026 04:37
@cw-Guo cw-Guo changed the title from "Feat/clean config" to "feat: clean the config and fix the manifests generation" May 16, 2026
@cw-Guo cw-Guo force-pushed the feat/clean-config branch from 0b0b396 to 7516867 Compare May 16, 2026 04:40
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@joshuabaird joshuabaird requested a review from Copilot May 19, 2026 14:24
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 53 out of 56 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (10)

config/rbac/kustomization.yaml:1

  • The comment states leader-election RBAC is 'intentionally disabled' and that 'the production deployment does not enable --leader-elect', but the rendered Deployment in manifests/setup/setup.yaml (and config/manager/manager.yaml) passes --leader-elect, and these two leader-election resources are actively included (not commented out). Either remove the contradictory comment or rewrite it to reflect that leader-election RBAC is enabled to match the manager args.

tests/scripts/fluentd_e2e.sh:1

  • Using a literal \n in the sed replacement to inject a new imagePullPolicy: line is a GNU sed extension; BSD/macOS sed will emit a literal n instead of a newline and produce invalid YAML, breaking the script for developers running e2e locally on macOS. Consider either using two separate sed expressions where the second uses a\ to append a line, piping through awk, or constructing the multi-line replacement using a literal newline inside the shell string ($'\n'). At minimum, document the GNU-sed dependency.

manifests/setup/setup.yaml:1

  • The manager is started with --metrics-bind-address=:8443 but the container declares no containerPort for 8443 and the metrics Service / NetworkPolicy / ServiceMonitor sections in config/default/kustomization.yaml are all commented out. The metrics endpoint will therefore be unreachable from outside the pod. If metrics aren't intended to be exposed in the default bundle, consider setting --metrics-bind-address=0 (disabled) to avoid binding an unused port; if they are intended, expose the port and enable the metrics Service.

manifests/setup/setup.yaml:1

  • readOnlyRootFilesystem: true is set, but the manager binary currently writes leader-election cache files (controller-runtime writes to its temp dir) and may rely on /tmp. Without an emptyDir mount at /tmp (only the env configmap is mounted under /fluent-operator), the pod can fail to acquire the leader lease or to perform any operation that touches the filesystem. Consider adding an emptyDir volume mounted at /tmp to keep the read-only root while preserving writable scratch space.

config/rbac/role_binding.yaml:1

  • The ServiceAccount subject no longer specifies a namespace. For a ClusterRoleBinding, the subject's namespace is required when kind: ServiceAccount; kustomize's namespace transformer does set it for ServiceAccount resources, but it does not by default rewrite subject namespaces inside (Cluster)RoleBindings unless configured to. Verify the rendered setup.yaml actually contains namespace: fluent under this subject (looking at the diff for setup.yaml, the ClusterRoleBinding still references namespace: fluent, so kustomize is handling it; please confirm this stays true if a consumer overrides the namespace in manifests/setup/kustomization.yaml).

config/manager/manager.yaml:1

  • Quoting ALL is unnecessary and inconsistent with the rendered manifests/setup/setup.yaml (which emits - ALL unquoted). Drop the quotes to keep source and generated forms identical and avoid spurious diffs on regeneration.

controllers/fluentdconfig_controller.go:1

  • The previous markers granted get;update;patch on fluentdconfigs/status and update on fluentdconfigs/finalizers. The new consolidated marker on line 90 covers */status but the /finalizers permission has been dropped entirely. If the controller still adds/removes finalizers on fluentdconfigs or clusterfluentdconfigs (the previous code did via the update verb on /finalizers), reconciliation will fail with a forbidden error. Please verify whether finalizer handling was removed; if not, restore a +kubebuilder:rbac:groups=fluentd.fluent.io,resources=clusterfluentdconfigs/finalizers;fluentdconfigs/finalizers,verbs=update marker.

controllers/fluentbit_controller.go:1

  • The marker core,resources=events,verbs=list;watch was removed here. If the controller's manager still records Events (controller-runtime's default event recorder calls create/patch on events) or any reconciler watches them, this will produce permission errors. The rendered ClusterRole in setup.yaml no longer grants events at all (only the new leader-election Role grants events: create;patch in the fluent namespace, which won't cover cluster-scoped consumers). Please confirm event recording still works at runtime, and if so, retain at least events,verbs=create;patch on the operator's ClusterRole or namespaced Role as appropriate.

tests/scripts/fluentd_e2e.sh:1

  • The VERSION variable (previously read from the VERSION file) was removed, but if other parts of this script or downstream tooling referenced $VERSION, those references will now be unbound. Please confirm nothing else in the file (not shown in the diff) still uses $VERSION.

manifests/setup/setup.yaml:1

  • The leader-election Role/RoleBinding are hard-coded to namespace: fluent. If a user customizes the install namespace via manifests/setup/kustomization.yaml (the comment in that file suggests doing so), this resource will be left in fluent while the Deployment moves to the new namespace, and leader election will fail with a forbidden error on leases. Confirm the kustomize namespace transformer rewrites these as expected, and document the requirement to regenerate (make manifests) after changing namespaces.
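
For the GNU sed concern raised on tests/scripts/fluentd_e2e.sh above, one of the options the review lists is to let awk append the line instead. A minimal sketch, assuming the goal is to add an imagePullPolicy key directly after each image: line; the helper name add_pull_policy, the sample manifest, and the policy value are hypothetical and not taken from the script:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): print every input line, and after
# each "image:" line print an imagePullPolicy line at the same key indent.
add_pull_policy() {
  awk -v policy="$1" '
    { print }
    /^[ \t]*(- )?image:[ \t]/ {
      indent = $0
      sub(/image:.*/, "", indent)   # keep only what precedes the key
      gsub(/-/, " ", indent)        # a list dash becomes a space so keys align
      print indent "imagePullPolicy: " policy
    }
  '
}

# Sample input; the image name is made up for illustration.
printf '%s\n' \
  '      containers:' \
  '        - image: example/fluent-operator:dev' \
  '          name: manager' | add_pull_policy IfNotPresent
```

Because awk prints the new line itself, no newline escape is needed in a sed replacement, so the sketch should behave the same with GNU and BSD tools.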

Comment thread Makefile
Comment thread config/default/kustomization.yaml
Comment thread .github/workflows/release-tool.yaml Outdated
Comment thread config/crd/kustomization.yaml Outdated
Copilot AI review requested due to automatic review settings May 20, 2026 03:28
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 54 out of 57 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

config/rbac/role_binding.yaml:15

  • subjects for a ClusterRoleBinding that targets a ServiceAccount should include namespace. With it omitted, applying this YAML directly would bind to a ServiceAccount in an empty/unknown namespace (effectively granting no permissions). If you intend to rely on kustomize’s namespace transformer to inject it, consider making that explicit (or set namespace: fluent here) to avoid standalone usage pitfalls.

Comment thread config/manager/manager.yaml
Comment thread config/default/kustomization.yaml Outdated
Comment thread config/rbac/kustomization.yaml Outdated
Comment thread config/manager/manager.yaml Outdated
  name: health
  protocol: TCP
securityContext:
  readOnlyRootFilesystem: true
Collaborator

Do we need an emptyDir at /tmp for controller-runtime's leader election? Something like:

volumeMounts:
  - name: tmp
    mountPath: /tmp
volumes:
  - name: tmp
    emptyDir: {}

Collaborator Author

Controller-runtime's leader election uses Kubernetes Lease objects (coordination.k8s.io/v1) via the API server — it doesn't write to local disk. No /tmp volume is required for it.

Comment thread config/default/kustomization.yaml
@cw-Guo cw-Guo force-pushed the feat/clean-config branch from 2d015b5 to fbde416 Compare May 23, 2026 03:08
cw-Guo added 2 commits May 22, 2026 20:09
Signed-off-by: Chengwei Guo <cwguoz@gmail.com>
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Signed-off-by: Chengwei Guo <cwguoz@gmail.com>
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
@cw-Guo cw-Guo force-pushed the feat/clean-config branch from fbde416 to b75c967 Compare May 23, 2026 03:11
Copilot AI review requested due to automatic review settings May 23, 2026 03:11
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 53 out of 56 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

tests/scripts/fluentd_e2e.sh:83

  • start_fluent_operator pipes manifests/setup/setup.yaml into kubectl create, but setup.yaml now contains a Namespace fluent manifest. Since the script also creates the namespace in prepare_cluster, this will fail with an AlreadyExists error (and abort due to errexit). Consider either removing the explicit namespace creation in prepare_cluster, switching this to kubectl apply, or filtering the Namespace document out of setup.yaml for the e2e path.

config/rbac/role_binding.yaml:15

  • This ClusterRoleBinding’s ServiceAccount subject is missing namespace. Without it, the manifest is invalid when applied directly (the API requires subjects[].namespace for ServiceAccount subjects). Add the namespace here (and let kustomize override it if needed) to keep the RBAC manifests self-contained and valid.
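
The third option suggested for the AlreadyExists failure above, filtering the Namespace document out of the rendered bundle, can be sketched with awk. This is an illustration only: the helper name drop_namespace_docs and the sample documents are hypothetical, and it assumes each document's kind appears as an unindented top-level key.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): buffer each YAML document of a
# multi-document stream and print only those whose kind is not Namespace.
drop_namespace_docs() {
  awk '
    function flush_doc() {
      if (doc != "" && !is_ns) printf "%s", doc
      doc = ""
      is_ns = 0
    }
    /^---[ \t]*$/ { flush_doc(); doc = $0 "\n"; next }   # a separator starts a new document
    /^kind:[ \t]*Namespace[ \t]*$/ { is_ns = 1 }         # mark the current document for removal
    { doc = doc $0 "\n" }
    END { flush_doc() }
  '
}

# Sample two-document stream; resource names are made up for illustration.
printf '%s\n' \
  'apiVersion: v1' \
  'kind: Namespace' \
  'metadata:' \
  '  name: fluent' \
  '---' \
  'apiVersion: v1' \
  'kind: ServiceAccount' \
  'metadata:' \
  '  name: example-operator' \
  '  namespace: fluent' | drop_namespace_docs
```

In the e2e path this could be used as drop_namespace_docs < manifests/setup/setup.yaml | kubectl create -f -, which leaves the namespace creation in prepare_cluster untouched. Switching to kubectl apply, as the comment also suggests, avoids the filter entirely.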

Comment thread tests/scripts/fluentd_e2e.sh
Comment thread config/default/kustomization.yaml Outdated
Comment thread Makefile
Comment thread config/rbac/kustomization.yaml
Comment thread manifests/setup/setup.yaml Outdated
cw-Guo added 4 commits May 22, 2026 20:42
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
@cw-Guo cw-Guo force-pushed the feat/clean-config branch from 3b688a1 to eb4acb7 Compare May 23, 2026 04:01
…ace before install

Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Copilot AI review requested due to automatic review settings May 23, 2026 04:18
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

cw-Guo added 2 commits May 22, 2026 21:32
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Wire up the HTTPS :8443 metrics endpoint that the operator already serves
by default: enable metrics-auth RBAC so TokenReview/SAR succeed, expose
the metrics Service, and align its selector with the operator pod labels.

Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
@cw-Guo cw-Guo marked this pull request as draft May 23, 2026 04:58
- leader_election_role: drop unused configmaps verbs (controller-runtime
  uses Leases by default).
- network-policy/prometheus monitor: fix podSelector / matchLabels to
  match the operator pod labels so endpoints actually resolve.
- manager_metrics_patch: append the metrics flag instead of inserting
  at args/0 to be robust against arg reordering.
- RBAC: add consistent app.kubernetes.io labels to leader-election and
  metrics_* roles/bindings; document that hardcoded subject namespaces
  are rewritten by kustomize overlays.

Signed-off-by: Chengwei Guo <chengweiguo@bytedance.com>
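
The manager_metrics_patch item in the commit message above relies on JSON Patch semantics: a trailing "-" in an array path appends to the array instead of inserting at a fixed index. A minimal sketch, assuming the manager is container 0 of the Deployment and reusing the --metrics-bind-address=:8443 value quoted earlier in the review; the helper name render_metrics_patch is hypothetical and the actual patch file in the PR may differ:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): print a JSON 6902 style patch that
# appends the metrics flag to the manager container's args.
render_metrics_patch() {
  cat <<'EOF'
# "/args/-" addresses the end of the array (RFC 6902), so the flag is appended
# regardless of how many args precede it. Container index 0 is an assumption.
- op: add
  path: /spec/template/spec/containers/0/args/-
  value: --metrics-bind-address=:8443
EOF
}

render_metrics_patch
```

With a path ending in args/0 the flag would instead be inserted at the front, shifting every existing argument by one position.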
@cw-Guo cw-Guo force-pushed the feat/clean-config branch from 7d414b8 to f39ab63 Compare May 23, 2026 05:05
@cw-Guo cw-Guo marked this pull request as ready for review May 23, 2026 05:22
Copilot AI review requested due to automatic review settings May 23, 2026 05:22
cw-Guo (Collaborator, Author) commented May 23, 2026

@joshuabaird please help review again. Thanks.

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

3 participants